Genetic typing of the human cytochrome P450 2A6 gene and related methods

ABSTRACT

Disclosed are novel polymorphisms in the human cytochrome P450 2A6 gene and the use of those polymorphisms as predictive sequences for altered metabolism or occurrence of disease.

FIELD OF THE INVENTION

The present invention relates to the identification of various polymorphisms in the cytochrome P450 2A6 gene and methods and reagents for genotyping and phenotyping individuals using such polymorphisms.

BACKGROUND OF THE INVENTION

The mammalian liver contains enzymes that convert various chemical compositions to products which can more easily be eliminated from the body. One enzyme system which plays a major role in determining the rate of elimination of drugs is cytochrome P450. The cytochrome P450s are among the major constituent proteins of the liver mixed function monooxygenases. They play a central role in the metabolism of steroids, the detoxification of drugs and xenobiotics, and the activation of procarcinogens. Without cytochrome P450 and related enzymes, naturally occurring and man-made foreign chemicals would accumulate in the body. Additionally, the biological effects of some chemicals are due solely to metabolites generated by cytochrome P450 and/or related enzymes. Metabolism by cytochrome P450 enzymes is often the rate-limiting step in pharmaceutical elimination. For example, most phase I metabolism of drugs and environmental pollutants is performed by cytochrome P450 enzymes. In this process, one or more water-soluble groups (such as hydroxyl) are introduced into the fat-soluble parent molecule, thereby rendering it vulnerable to attack by the phase II conjugating enzymes. The increased water-solubility of phase I and especially phase II products permits ready excretion. Consequently, factors that lessen the activity of cytochrome P450 enzymes usually prolong the effects of pharmaceuticals, whereas factors that increase cytochrome P450 activity have the opposite effect.

The phenobarbital-inducible P450 gene, CYP2A6 or CYP4502A6, is a member of a multigene family located on human chromosome 19. Induction by phenobarbital is mediated almost entirely at the level of transcription. P450 enzymes, as well as other so-called “drug-metabolizing” enzymes, play an important role in maintaining the steady-state levels of endogenous ligands involved in ligand-modulated transcription of genes affecting homeostasis, growth, differentiation, and neuroendocrine functions.

Genetic polymorphisms of cytochrome P450 enzymes result in subpopulations of individuals that are distinct in their ability to perform particular drug biotransformation reactions. These phenotypic distinctions have important implications for selection of drugs. For example, a drug that is safe when administered to the majority of humans may cause intolerable side-effects in an individual suffering from a defect in a cytochrome P450 enzyme required for detoxification of the drug. Alternatively, a drug that is effective in most humans may be ineffective in a particular subpopulation because of the lack of a particular cytochrome P450 enzyme required for conversion of the drug to a metabolically active form. Accordingly, it is important for both drug development and clinical use to screen drugs to determine which cytochrome P450 enzymes are required for activation and/or detoxification of the drug.

It is also important to identify those individuals who are deficient in a particular P450 enzyme. This type of information has been used to advantage in the past for developing genetic assays that predict phenotype and thus predict an individual's ability to metabolize a given drug. Information such as this would be of particular value in determining the likely side effects and therapeutic failures of various drugs, and routine phenotyping could be recommended for certain categories of patients.

Wood and Conney, Science, 1974, vol. 185, pages 612-614, found that basal and phenobarbital-induced rates of hepatic metabolism of coumarin to 7-hydroxycoumarin were markedly higher in DBA-2J mice than in other strains and that intermediate activities in hybrids indicated codominant inheritance. They suggested that there could be similar variability in man. Kratz, Europ. J. Clin. Pharm., 1976, vol. 10, pages 133-137, studied coumarin 7-hydroxylase activity in human liver obtained by needle biopsy. A 4-fold range of enzymatic activity was observed and Kratz suggested that the difference was due to genetic differences between sample donors. Kratz excluded individuals taking drugs that might induce enzyme activity from the study. Yamano et al., Biochemistry, 1990, vol. 29, pages 1322-1329, reported a variant allele of the CYP2A6 gene termed *2 that had a single nucleotide substitution that resulted in an amino acid substitution of a histidine for a leucine at position 160. The variant allele was found to encode an unstable and catalytically inactive enzyme. Fernandez-Salguero et al., Am. J. Hum. Genet., 1995, vol. 57, pages 651-660, reported the genomic sequence for the CYP2A6, CYP2A7, and CYP2A13 genes, in addition to 2 pseudogenes truncated after exon 5, located on 19q13.2. They also identified three different CYP2A6 alleles: the functional CYP2A6 allele, referred to as *1; the variant-1 allele that had a single base mutation of a T to an A resulting in a substitution of a histidine for a leucine in exon 3, referred to as *2; and the variant-2 allele which was formed by gene conversion between the wildtype CYP2A6 and CYP2A7 genes in exons 3, 6, and 8, referred to as *3.

Four different deletion mutants resulting in an absence of enzyme activity have been described by prior investigators. Oscarson et al., FEBS Lett., 1999, vol. 448, pages 105-110, described the structure of a novel CYP2A locus, referred to as *4A, in which the entire CYP2A6 gene had been deleted, thereby disrupting CYP2A6-dependent metabolism. They proposed that this allele was generated by an unequal crossover event between the 3-prime flanking region of the CYP2A6 and CYP2A7 genes. A “D-type” deletion mutant lacking the CYP2A6 gene region from intron 5 to exon 9, referred to as *4B, was described by Nonoya et al., Pharmacogenetics, 1998, vol. 8, pages 239-249. An “E-type” mutant referred to as *4C, in which exons 1, 8, and 9 of the CYP2A6 gene were deleted, was also identified by Nonoya et al., J Pharmacol Exp Ther., 1999, vol. 289, pages 437-442. Oscarson et al., FEBS Lett., 1999, vol. 460, pages 321-327, identified a fourth type of deletion mutant referred to as *4D that they suggested resulted from an unequal crossover event with a junction at either intron 8 or exon 9. In addition to characterizing the new deletion mutant CYP2A6*4D, Oscarson et al. also reported a new variant referred to as *5 that was a single nucleotide change of G to T at position 1436, resulting in a substitution of a valine for a glycine at codon 479. This variant allele resulted in a poor metabolizer phenotype. In addition, they found a new wild type variant referred to as *1B that resulted from a gene conversion in the 3′ flanking region of the CYP2A6 gene.

It has been established in the art that nicotine is inactivated by C-oxidation to cotinine. Tyndale, PCT Publication No. WO 98/03171, published Jan. 29, 1998, disclosed that inhibitors of the enzyme encoded by the CYP2A6 gene cause a decrease in nicotine metabolism. It has been suggested in the art that the enzyme encoded by the CYP2A6 gene may affect smoking patterns by mediating the metabolism of nicotine (Vineis et al., in Metabolic Polymorphisms and Susceptibility to Cancer, IARC Scientific Publication No. 148, 1999). Pianezza et al., Nature, 1998, vol. 393, page 750, disclosed that smokers carrying two null CYP2A6 alleles consumed fewer cigarettes. Oscarson et al., FEBS Lett., 1998, vol. 438, pages 201-205, however, indicated that Pianezza et al. used an erroneous method to measure the association of the genotype to the phenotype and that additional studies therefore need to be performed to correctly determine the true phenotype of individuals that are genetically CYP2A6 defective. London et al., Lancet, 1999, vol. 353, pages 898-899, also disclosed that polymorphism in the CYP2A6 gene has little influence on the propensity to smoke cigarettes. Seller and Tyndale, PCT Publication No. WO 99/27919, published Jun. 10, 1999, disclosed that the presence of the *2 and *3 mutant alleles of CYP2A6 is related to whether an individual becomes a smoker or, if already a smoker, to the number of cigarettes that person smokes. Seller and Tyndale conclude that the CYP2A6 genotype directly influences the risk for tobacco dependence. Genotyping methods using variants of the CYP2A6 gene have been suggested by prior investigators (e.g., Kitagawa et al., Biochem Biophys Res Comm, 1999, vol. 262, pages 146-151).

None of the previous investigators, however, have identified the polymorphisms of the present invention or associated any such genetic variation with the susceptibility to or occurrence of inflammation, asthma or habitual smoking.

There still remains a need in the art to identify polymorphisms in the CYP2A6 gene that have predictive value for altered metabolism or occurrence of disease.

SUMMARY OF THE INVENTION

The present invention relates to novel polymorphisms located in the human CYP2A6 gene and the use of those polymorphisms as predictive sequences for altered metabolism or occurrence of disease. According to the present invention there are provided CYP2A6 polymorphic nucleic acid sequences and methods to use such nucleic acid sequences, in particular for diagnostic purposes to identify individuals having a polymorphic genotype.

One embodiment of the present invention includes an isolated nucleic acid molecule having a nucleic acid sequence selected from the group consisting of SEQ ID NO:10, SEQ ID NO:12, SEQ ID NO:14, SEQ ID NO:16, SEQ ID NO:18, and SEQ ID NO:20 and nucleic acid sequences that are fully complementary thereto. Another embodiment of the present invention includes an isolated nucleic acid molecule that comprises at least one base variation from that of a known human P450 sequence, wherein the nucleic acid molecule is selected from the group consisting of: (a) a nucleic acid molecule that comprises a T for a C at position 202 of SEQ ID NO:21 and at least 20 other bases of SEQ ID NO:21 contiguously appurtenant to said position; (b) a nucleic acid molecule which comprises a C for a T at position 369 of SEQ ID NO:21 and at least 20 other bases of SEQ ID NO:21 contiguously appurtenant to said position; (c) a nucleic acid molecule which comprises an A for a G at position 394 of SEQ ID NO:21 and at least 20 other bases of SEQ ID NO:21 contiguously appurtenant to said position; (d) a nucleic acid molecule which comprises an A for a C at position 413 of SEQ ID NO:21 and at least 20 other bases of SEQ ID NO:21 contiguously appurtenant to said position; (e) a nucleic acid molecule which comprises a G for a T at position 743 of SEQ ID NO:21 and at least 20 other bases of SEQ ID NO:21 contiguously appurtenant to said position; (f) a nucleic acid molecule which comprises an A for a G at position 841 of SEQ ID NO:21 and at least 20 other bases of SEQ ID NO:21 contiguously appurtenant to said position; and (g) a nucleic acid molecule which is fully complementary to a nucleic acid molecule of (a)-(f).

Further embodiments of the invention include various methods for identifying polymorphisms. One such method is a method for identifying a polymorphism in a nucleic acid molecule of an individual which includes determining whether a nucleic acid sequence selected from SEQ ID NO:10, SEQ ID NO:12, SEQ ID NO:14, SEQ ID NO:16, SEQ ID NO:18, and SEQ ID NO:20 or a nucleic acid sequence that is fully complementary thereto is present in the nucleic acid molecule. Two other such methods include a method for evaluating an individual's risk of developing asthma and a method for evaluating an individual's propensity for cigarette consumption. These methods include obtaining a nucleic acid molecule sample from said individual. The methods further include determining whether a polymorphism in a nucleic acid sequence of the gene encoding coumarin 7-hydroxylation protein is present in the nucleic acid sample, wherein the polymorphism is selected from: a T for C substitution corresponding to position 202 of SEQ ID NO:21; a C for T substitution corresponding to position 369 of SEQ ID NO:21; an A for G substitution corresponding to position 394 of SEQ ID NO:21; an A for C substitution corresponding to position 413 of SEQ ID NO:21; a G for T substitution corresponding to position 743 of SEQ ID NO:21; and an A for G substitution corresponding to position 841 of SEQ ID NO:21.

The methods of the present invention can further include determining whether an individual is homozygous or heterozygous for a given nucleic acid sequence. Such methods can be either a cDNA assay or a genomic DNA assay. Such methods can also include a step of digesting a nucleic acid molecule with a restriction enzyme that distinguishes between a polymorphic nucleic acid sequence and the corresponding wildtype sequence. Further, the methods can include amplifying a selected region of the nucleic acid molecule of the individual.
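By way of a non-limiting illustration, the restriction digestion step can be sketched in Python as a check of whether a recognition site is present in one allele but absent from the other. The two amplicon sequences and the function names below are hypothetical and are not taken from SEQ ID NO:21; the EcoRI recognition sequence GAATTC is used purely as an example of an enzyme site and is not asserted to be an enzyme used with the polymorphisms of the invention.

```python
def cut_sites(sequence: str, site: str) -> list[int]:
    """Return 0-based start indices of every occurrence of a recognition site."""
    sequence, site = sequence.upper(), site.upper()
    return [i for i in range(len(sequence) - len(site) + 1)
            if sequence[i:i + len(site)] == site]


def distinguishes(wildtype: str, variant: str, site: str) -> bool:
    """True if the number of recognition sites differs between the two alleles."""
    return len(cut_sites(wildtype, site)) != len(cut_sites(variant, site))


# Hypothetical amplicons differing by a single base (illustration only).
WILDTYPE = "ACGTTGAATTCGGCTA"
VARIANT = "ACGTTGAGTTCGGCTA"
SITE = "GAATTC"  # EcoRI recognition sequence, chosen only as an example

print(cut_sites(WILDTYPE, SITE))
print(cut_sites(VARIANT, SITE))
print(distinguishes(WILDTYPE, VARIANT, SITE))
```

Only one strand is scanned in this sketch, which is sufficient for a palindromic site such as the example shown; a non-palindromic site would also require scanning the complementary strand.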

Additional embodiments of the present invention include kits for conducting the various methods. Such kits can include nucleic acid molecules of the present invention, as well as restriction enzymes useful in the methods.

Further embodiments of the present invention include a computer for displaying the nucleic acid sequence of a molecule of the present invention. Such a computer includes a computer-readable medium encoded with the nucleic acid sequence, to create an electronic file. The computer further includes hardware and software that display the nucleic acid sequence in the electronic file as a linear model of the molecule for analysis, alignment with other sequences or visualization of the nucleic acid sequence.

A further embodiment of the present invention is an isolated nucleic acid molecule comprising a nucleic acid sequence selected from SEQ ID NO:21 and a nucleic acid sequence that is fully complementary to SEQ ID NO:21.

A still further embodiment of the present invention is an isolated nucleic acid molecule having a nucleic acid sequence consisting essentially of a nucleic acid sequence selected from SEQ ID NO:9, SEQ ID NO:11, SEQ ID NO:13, SEQ ID NO:15, SEQ ID NO:17, and SEQ ID NO:19 and nucleic acid sequences that are fully complementary thereto.

BRIEF DESCRIPTION OF THE FIGURE

FIG. 1 illustrates the amount of luciferase activity detected using cell lines transfected with an expression vector containing the wildtype CYP2A6, mutant CYP2A6 or no insert.

DETAILED DESCRIPTION OF THE INVENTION

The present invention relates to compositions that contain certain genetic characteristics and methods that reveal the presence or absence of such characteristics. The present invention includes the identification of different genetic polymorphisms in the cytochrome P450 2A6 (CYP4502A6 or CYP2A6) gene. The presence or absence of the polymorphism at one or more of these sites has been found to be prognostic or diagnostic for inflammation, asthma or smoking. Nucleic acid molecules comprising the polymorphic sequences are used to screen individuals for altered metabolism for CYP2A6 substrates, potential drug-drug interactions, drug adverse side-effects, inflammation, asthma, susceptibility to habitual smoking, and diseases that result from environmental or occupational exposure to dangerous substances.

It is to be understood that the inventions disclosed herein are not limited to the particular methodology, protocols, cell lines, animal species or genera, constructs and reagents described, and as such may vary. It is also to be understood that the terminology used herein is for the purpose of describing particular embodiments only, and is not intended to limit the scope of the present invention which will be limited only by the appended claims.

For the purposes of the present invention, the term “a” or “an” entity refers to one or more of that entity; for example, “a protein” or “a nucleic acid molecule” refers to one or more of those compounds or at least one compound. As such, the terms “a” (or “an”), “one or more” and “at least one” can be used interchangeably herein. It is also to be noted that the terms “comprising”, “including”, and “having” can be used interchangeably. Furthermore, a compound “selected from the group consisting of” refers to one or more of the compounds in the list that follows, including mixtures (i.e., combinations) of two or more of the compounds.

According to the present invention, reference to an “isolated nucleic acid molecule” refers to a nucleic acid molecule which is the size of or smaller than a gene. Thus, an isolated nucleic acid molecule does not encompass isolated genomic DNA or an isolated chromosome. The term isolated nucleic acid molecule does not connote any specific minimum length. It should also be appreciated that reference to an isolated nucleic acid molecule does not necessarily reflect the extent of purity of the nucleic acid molecule. An isolated nucleic acid molecule of the present invention can be obtained from a natural source, such as a tissue sample, or it can be produced using molecular biology techniques, such as by PCR amplification, or it can be produced by chemical synthesis. “Allele” has the meaning which is commonly known in the art, that is, a genomic variant of a referent gene, including variants which, when translated, result in functional or dysfunctional (including non-existent) gene products. The first identified allelic form is arbitrarily designated as the reference form and other allelic forms are designated as alternative or variant alleles. The allelic form occurring most frequently in a selected population is sometimes referred to as the wildtype form.

“Contiguously appurtenant to” means any bases flanking the referent position, including the instances of all bases selected 5′ to the referent position and no bases selected 3′ to the referent position; all bases selected 3′ to the referent position and no bases selected 5′ to the referent position; and some bases selected 5′ and some bases selected 3′ to the referent position. The term is intended to mean that the selected bases necessarily must be in the same sequential order as described in the referent sequence, with the exception of the variant base at the referent position.
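For illustration only, the three cases described in this definition (all flanking bases 5′, all flanking bases 3′, or some on each side) can be sketched in Python as follows. The 16-base reference string, the chosen position and the function name are hypothetical and are not derived from SEQ ID NO:21.

```python
def appurtenant_fragment(reference: str, position: int, variant_base: str,
                         five_prime: int, three_prime: int) -> str:
    """Return the variant base together with contiguous reference bases.

    position is 1-based. five_prime and three_prime are the numbers of
    reference bases kept on each side, in their original sequential order.
    """
    index = position - 1
    if not 0 <= index < len(reference):
        raise ValueError("position lies outside the reference sequence")
    if five_prime > index or three_prime > len(reference) - index - 1:
        raise ValueError("not enough flanking bases in the reference")
    left = reference[index - five_prime:index]
    right = reference[index + 1:index + 1 + three_prime]
    return left + variant_base + right


REFERENCE = "AACCGGTTAACCGGTT"  # hypothetical reference, illustration only

# Variant base G at hypothetical position 9, with four flanking bases in total.
all_five_prime = appurtenant_fragment(REFERENCE, 9, "G", 4, 0)
all_three_prime = appurtenant_fragment(REFERENCE, 9, "G", 0, 4)
both_sides = appurtenant_fragment(REFERENCE, 9, "G", 2, 2)
print(all_five_prime, all_three_prime, both_sides)
```

In each case the flanking bases keep the order they have in the reference, and only the base at the referent position differs.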

“For the purpose of determining genotype” means that one of the purposes is to determine genotype, not necessarily that the end goal or use of the information is to determine genotype. For instance, “for the purpose of determining genotype” includes the use of the information to determine genotype for the ultimate goal of determining probability of negative or positive drug interactions.

“Gene” has the meaning that is commonly-known in the art, that is, a nucleic acid sequence that includes the translated sequences that code for a protein (“exons”) and the untranslated intervening sequences (“introns”), and any regulatory elements ordinarily necessary to translate the protein.

“Genotype” has the meaning that is commonly-known in the art, that is, a physical description of a nucleic acid sequence.

“Hybridization” has the meaning that is commonly-known in the art, that is, the formation of a duplex structure by two single-stranded nucleic acids due to complementary base pairing. Hybridization can occur between exactly complementary nucleic acid strands or between nucleic acid strands that contain some regions of mismatch.

“Polymorphism” means a polymorphism wherein the group exists by virtue of a difference in identity of one or more nucleotides at given sequence locations. The location of nucleotide identity differences is usually preceded by and followed by highly conserved sequences (e.g., sequences that vary in less than 1/100 or 1/1000 members of the populations). However, more than one single nucleotide polymorphism can exist between or among the group members. A “transition” is the replacement of one purine by another purine or one pyrimidine by another pyrimidine. A “transversion” is the replacement of a purine by a pyrimidine or vice versa. Single nucleotide polymorphisms can also arise from a deletion of a nucleotide or an insertion of a nucleotide relative to a given sequence location.
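As a short, non-limiting sketch, the transition/transversion distinction can be expressed in Python and applied to the six substitutions recited herein for positions 202, 369, 394, 413, 743 and 841 of SEQ ID NO:21. The function and variable names are illustrative assumptions.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(ref: str, alt: str) -> str:
    """Classify a single-base substitution as a transition or a transversion."""
    ref, alt = ref.upper(), alt.upper()
    valid = PURINES | PYRIMIDINES
    if ref not in valid or alt not in valid:
        raise ValueError("bases must be A, C, G or T")
    if ref == alt:
        raise ValueError("reference and variant bases are identical")
    # Same chemical class on both sides means a transition.
    if {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES:
        return "transition"
    return "transversion"


# (position in SEQ ID NO:21, wildtype base, variant base) as recited in the text.
SUBSTITUTIONS = [(202, "C", "T"), (369, "T", "C"), (394, "G", "A"),
                 (413, "C", "A"), (743, "T", "G"), (841, "G", "A")]

classes = {pos: classify_substitution(ref, alt) for pos, ref, alt in SUBSTITUTIONS}
for pos, kind in classes.items():
    print(pos, kind)
```

Under this classification, the substitutions at positions 413 and 743 are transversions and the remaining four are transitions.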

“Stringent hybridization” means hybridization under conditions commonly known in the art, that is, at a salt concentration of no more than 1 M and a temperature of at least 25 degrees Celsius. For example, conditions of 5×SSPE (750 mM NaCl, 50 mM sodium phosphate, 5 mM EDTA, pH 7.4) and a temperature of 55 degrees to 60 degrees Celsius are suitable.
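The arithmetic behind the buffer strengths can be sketched as follows, assuming that the component concentrations scale linearly from the 5×SSPE composition given above. Treating the stated 1 M limit as applying to the NaCl concentration alone is an assumption made only for this sketch, and the function names are hypothetical.

```python
# Composition of 5x SSPE in millimolar, as stated in the definition above.
SSPE_5X_MM = {"NaCl": 750.0, "sodium phosphate": 50.0, "EDTA": 5.0}


def sspe_at(fold: float) -> dict[str, float]:
    """Scale the 5x SSPE composition linearly to another strength."""
    return {name: mm * fold / 5.0 for name, mm in SSPE_5X_MM.items()}


def meets_stated_minimum(nacl_mm: float, temperature_c: float) -> bool:
    """Check the stated limits: salt of at most 1 M and at least 25 degrees C.

    Assumption for this sketch: the salt limit is applied to NaCl only.
    """
    return nacl_mm <= 1000.0 and temperature_c >= 25.0


one_fold = sspe_at(1)
print(one_fold)
print(meets_stated_minimum(sspe_at(5)["NaCl"], 55.0))
```

For example, 1×SSPE under this scaling contains 150 mM NaCl, 10 mM sodium phosphate and 1 mM EDTA.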

In the present invention, alleles are expressed by symbols in accordance with definitions given by IUPAC-IUB and common names or common usage in the art.

The wildtype CYP2A6 gene encodes an enzyme called coumarin 7-hydroxylase protein.

One embodiment of the present invention is an isolated nucleic acid molecule comprising a nucleic acid sequence selected from the group consisting of: a nucleic acid sequence comprising a nucleic acid sequence selected from the group consisting of SEQ ID NO:10, SEQ ID NO:12, SEQ ID NO:14, SEQ ID NO:16, SEQ ID NO:18, and SEQ ID NO:20; and a nucleic acid sequence that is fully complementary to such a nucleic acid sequence. In accordance with the present invention, an isolated nucleic acid molecule is a nucleic acid molecule that has been removed from its natural milieu (i.e., that has been subject to human manipulation) and can include DNA, RNA, or derivatives of either DNA or RNA. An isolated CYP2A6 nucleic acid molecule of the present invention can be isolated from its natural source or can be produced using recombinant DNA technology (e.g., polymerase chain reaction (PCR) amplification, cloning) or chemical synthesis. The CYP2A6 nucleic acid molecules of the present invention are isolated and obtained in substantial purity, generally as other than an intact chromosome. Usually, the nucleic acid molecule will be obtained substantially free of other nucleic acid sequences that do not include a CYP2A6 sequence or fragment thereof, generally being at least about 50%, usually at least about 90% pure. Although the phrase “nucleic acid molecule” primarily refers to the physical nucleic acid molecule and the phrase “nucleic acid sequence” primarily refers to the sequence of nucleotides on the nucleic acid molecule, the two phrases can be used interchangeably.

The nucleic acid sequence of the CYP2A6 genomic gene is generally known in the art and accessible in public databases as cited above. For example, GenBank Accession No. U22027 identifies the human CYP4502A6 gene, although it contains some errors, as discussed below more fully in the Examples section. The sequence is useful as a reference for the genomic location of a polymorphism within the CYP2A6 gene or for specific CYP2A6 coding region sequences. As used herein, the term “CYP2A6 gene” is intended to refer to both the wildtype and polymorphic sequences, unless specifically denoted otherwise. Nucleic acids of particular interest comprise the provided polymorphic sequences. It is within the skill of one in the art to identify the location of a polymorphic sequence of the present invention using wildtype CYP2A6 genomic or cDNA sequences known in the art. A skilled artisan can use a polymorphic sequence, its corresponding wildtype sequence and the CYP2A6 sequence contiguously appurtenant to the referenced polymorphism provided in Table 3 with a known genomic sequence or cDNA sequence to determine the position of the polymorphism. It is within the scope of the invention that a polymorphism includes detection at the designated genomic sequence nucleotide position or its corresponding copy DNA (cDNA) position if the polymorphism is located within the coding region of the CYP2A6 sequence.

In accordance with the present invention, the polymorphisms of the CYP2A6 sequence occur at nucleotide −580, −413, −388, −369 or −39 of the promoter region of the CYP2A6 genomic sequence or nucleotide 51 of exon 1 of the CYP2A6 genomic sequence, with reference to the positions shown in the wild type sequence of SEQ ID NO:21, wherein positions 791-793 of SEQ ID NO:21 are nucleotides 1-3 of exon 1 (the initiation codon), and position 781, the transcription starting point, is nucleotide −1 of the promoter region. By identifying the location of the polymorphism at nucleotide 51 of the genomic sequence, it is within the skill of one in the art to determine the corresponding nucleotide number designation in the coding region of a cDNA sequence encoding coumarin 7-hydroxylase protein. These same polymorphisms correspond to nucleotide positions 202, 369, 394, 413, 743 and 841, respectively, of SEQ ID NO:21. For the purposes of identification in this application, the positions of the polymorphisms of the present invention will be referenced as nucleotide positions 202, 369, 394, 413, 743 or 841.
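The numbering relationship described above can be sketched in Python as follows. The sketch relies only on the two anchor points stated in the text (position 781 of SEQ ID NO:21 is promoter nucleotide −1, and position 791 is nucleotide 1 of exon 1); the function and constant names are illustrative assumptions.

```python
PROMOTER_MINUS_ONE = 781  # SEQ ID NO:21 position of promoter nucleotide -1
EXON1_FIRST_BASE = 791    # SEQ ID NO:21 position of exon 1 nucleotide 1


def promoter_to_seq_position(nucleotide: int) -> int:
    """Convert a negative promoter coordinate to a SEQ ID NO:21 position."""
    if nucleotide >= 0:
        raise ValueError("promoter coordinates are negative")
    return PROMOTER_MINUS_ONE + 1 + nucleotide


def exon1_to_seq_position(nucleotide: int) -> int:
    """Convert a 1-based exon 1 coordinate to a SEQ ID NO:21 position."""
    if nucleotide < 1:
        raise ValueError("exon coordinates start at 1")
    return EXON1_FIRST_BASE + nucleotide - 1


promoter_positions = [promoter_to_seq_position(n)
                      for n in (-580, -413, -388, -369, -39)]
exon_position = exon1_to_seq_position(51)
print(promoter_positions, exon_position)
```

Applied to the six polymorphic sites, the two conversions reproduce positions 202, 369, 394, 413, 743 and 841 of SEQ ID NO:21.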

In the case of positions −580, −413, −388, −369 or −39 of the promoter region or nucleotide 51 of exon 1, the polymorphism is typically one or more base pair substitutions such as C to T, T to C, G to A, C to A, T to G or G to A, respectively. The polymorphisms are silent; thus, the polymorphisms in the promoter region do not affect functioning of the promoter and the polymorphism in exon 1 does not result in an amino acid substitution.

Another embodiment of the present invention is an isolated nucleic acid molecule that comprises at least one base variation from that of a known human P450 sequence, wherein said nucleic acid molecule is selected from the group consisting of: (a) a nucleic acid molecule that comprises a T for a C at position 202 of SEQ ID NO:21 and at least 20 other bases, alternatively at least 30 other bases, at least 40 other bases or at least 50 other bases, of SEQ ID NO:21 contiguously appurtenant to said position; (b) a nucleic acid molecule which comprises a C for a T at position 369 of SEQ ID NO:21 and at least 20 other bases, alternatively at least 30 other bases, at least 40 other bases or at least 50 other bases, of SEQ ID NO:21 contiguously appurtenant to said position; (c) a nucleic acid molecule which comprises an A for a G at position 394 of SEQ ID NO:21 and at least 20 other bases, alternatively at least 30 other bases, at least 40 other bases or at least 50 other bases, of SEQ ID NO:21 contiguously appurtenant to said position; (d) a nucleic acid molecule which comprises an A for a C at position 413 of SEQ ID NO:21 and at least 20 other bases, alternatively at least 30 other bases, at least 40 other bases or at least 50 other bases, of SEQ ID NO:21 contiguously appurtenant to said position; (e) a nucleic acid molecule which comprises a G for a T at position 743 of SEQ ID NO:21 and at least 20 other bases, alternatively at least 30 other bases, at least 40 other bases or at least 50 other bases, of SEQ ID NO:21 contiguously appurtenant to said position; (f) a nucleic acid molecule which comprises an A for a G at position 841 of SEQ ID NO:21 and at least 20 other bases, alternatively at least 30 other bases, at least 40 other bases or at least 50 other bases, of SEQ ID NO:21 contiguously appurtenant to said position; or (g) a nucleic acid which is fully complementary to a nucleic acid molecule of (a) through (f). In this embodiment, the isolated nucleic acid molecule can be defined, in part, by comprising a nucleic acid sequence selected from SEQ ID NO:10, SEQ ID NO:12, SEQ ID NO:14, SEQ ID NO:16, SEQ ID NO:18, or SEQ ID NO:20.

Preferred CYP2A6 nucleic acid molecules include nucleic acid molecules having a nucleic acid sequence that is at least about 80%, more preferably at least about 85%, more preferably at least about 90%, more preferably at least about 95%, and more preferably at least about 98% identical to nucleic acid sequence SEQ ID NO:10, SEQ ID NO:12, SEQ ID NO:14, SEQ ID NO:16, SEQ ID NO:18, and/or SEQ ID NO:20.

As used herein, unless otherwise specified, reference to a percent (%) identity refers to an evaluation of homology which is performed using: (1) a BLAST 2.0 Basic BLAST homology search (http://www.ncbi.nlm.nih.gov/BLAST) using blastn for nucleic acid searches with standard default parameters, wherein the query sequence is filtered for low complexity regions by default (described in Altschul, S. F., Madden, T. L., Schäffer, A. A., Zhang, J., Zhang, Z., Miller, W. & Lipman, D. J. (1997) “Gapped BLAST and PSI-BLAST: a new generation of protein database search programs.” Nucleic Acids Res. 25:3389-3402, incorporated herein by reference in its entirety); (2) a BLAST 2 alignment (using the parameters described below) (http://www.ncbi.nlm.nih.gov/BLAST); or (3) both BLAST 2.0 and BLAST 2. It is noted that due to some differences in the standard parameters between BLAST 2.0 Basic BLAST and BLAST 2, two specific sequences might be recognized as having significant homology using the BLAST 2 program, whereas a search performed in BLAST 2.0 Basic BLAST using one of the sequences as the query sequence may not identify the second sequence in the top matches. Therefore, it is to be understood that percent identity can be determined by using either one or both of these programs.

Two specific sequences can be aligned to one another using BLAST 2 sequence as described in Tatusova and Madden, (1999), “Blast 2 sequences, a new tool for comparing protein and nucleotide sequences”, FEMS Microbiol Lett. 174:247-250, incorporated herein by reference in its entirety. BLAST 2 sequence alignment is performed in blastn using the BLAST 2.0 algorithm to perform a Gapped BLAST search (BLAST 2.0) between the two sequences allowing for the introduction of gaps (deletions and insertions) in the resulting alignment. For purposes of clarity herein, a BLAST 2 sequence alignment is performed using the standard default parameters as follows.

For blastn, using 0 BLOSUM62 matrix:

- Reward for match=1
- Penalty for mismatch=−2
- Open gap (5) and extension gap (2) penalties
- gap x_dropoff (50) expect (10) word size (11) filter (on)
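To illustrate how the match reward, mismatch penalty and gap penalties listed above combine, the following Python sketch scores an already aligned pair of sequences. It is not the BLAST algorithm itself, which also performs the search and the alignment. Two assumptions are made for the sketch: a gap of length k is charged the open penalty once plus the extension penalty for each of its k positions, and percent identity is taken as identical columns divided by the alignment length. The example sequences are hypothetical.

```python
def score_alignment(a: str, b: str, match: int = 1, mismatch: int = -2,
                    gap_open: int = 5, gap_extend: int = 2) -> tuple[int, float]:
    """Score two gapped, equal-length aligned strings ('-' marks a gap).

    Returns (raw score, percent identity over the alignment length).
    """
    if len(a) != len(b) or not a:
        raise ValueError("aligned sequences must be non-empty and equal in length")
    score, identical, gap_row = 0, 0, None
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" and y == "-":
            raise ValueError("a column cannot contain two gaps")
        if x == "-" or y == "-":
            row = "a" if x == "-" else "b"
            if row != gap_row:      # a new gap starts: charge the open penalty
                score -= gap_open
            score -= gap_extend     # every gap position pays the extension
            gap_row = row
        else:
            gap_row = None
            if x == y:
                score += match
                identical += 1
            else:
                score += mismatch
    return score, 100.0 * identical / len(a)


result = score_alignment("ACGTACGT", "ACGA--GT")
print(result)
```

In this hypothetical alignment there are five matches (+5), one mismatch (−2) and one gap of length two (−5 − 2 − 2), giving a raw score of −6 and 62.5% identity over eight columns.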

In some embodiments, as indicated, to align and calculate the percent identity between two amino acid sequences, the Martinez/Needleman-Wunsch DNA alignment method is used. This method is provided by the Lasergene MegAlign, a module within the DNASTAR program (DNASTAR, Inc., Madison, Wis.), and the standard default parameters are used as follows:

- (1) Minimum match=9;
- (2) Gap penalty=1.10;
- (3) Gap length penalty=0.33.

Another preferred nucleic acid molecule of the present invention includes at least a portion of nucleic acid sequence SEQ ID NO:10, SEQ ID NO:12, SEQ ID NO:14, SEQ ID NO:16, SEQ ID NO:18, and/or SEQ ID NO:20, that is capable of hybridizing to a CYP2A6 gene and includes an allelic variation of the wild type CYP2A6 gene. A more preferred nucleic acid molecule includes the nucleic acid sequence SEQ ID NO:10, SEQ ID NO:12, SEQ ID NO:14, SEQ ID NO:16, SEQ ID NO:18, and/or SEQ ID NO:20. Such nucleic acid molecules can include nucleotides in addition to those included in the SEQ ID NOs, such as, but not limited to, a full-length gene or a full-length coding region.

The present invention also includes nucleic acid molecules that are oligonucleotides capable of hybridizing, under stringent hybridization conditions, with complementary regions of other, preferably longer, nucleic acid molecules of the present invention such as those comprising CYP2A6 genes or other CYP2A6 nucleic acid molecules. Oligonucleotides of the present invention can be RNA, DNA, or derivatives of either. The minimum size of such oligonucleotides is the size required for formation of a stable hybrid between an oligonucleotide and a complementary sequence on a nucleic acid molecule of the present invention. Minimal size characteristics are disclosed herein. The present invention includes oligonucleotides that can be used as, for example, probes to identify nucleic acid molecules or primers to produce nucleic acid molecules. Also provided are oligonucleotides that can be used as primers to amplify DNA from a variant or a wildtype CYP2A6 nucleic acid molecule. Preferred oligonucleotide probes or primers include a single base change of a polymorphism of the present invention or the wildtype nucleotide that is located at the same position. Preferably the nucleotide of interest occupies a central position of a probe. Preferably the nucleotide of interest occupies a 3′ position of a primer.
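The two preferred placements of the nucleotide of interest (central in a probe, at the 3′ end of a primer) can be sketched in Python as follows. The reference string, the position and the function names are hypothetical, only a forward-strand primer is shown, and the oligonucleotides are kept shorter than the minimal sizes discussed herein purely for readability.

```python
def central_probe(reference: str, position: int, base: str, length: int) -> str:
    """Build an odd-length probe with the base of interest at its center.

    position is 1-based in the reference sequence.
    """
    if length % 2 == 0:
        raise ValueError("use an odd length so one base is exactly central")
    index, half = position - 1, length // 2
    if index - half < 0 or index + half >= len(reference):
        raise ValueError("not enough flanking sequence for this probe length")
    return reference[index - half:index] + base + reference[index + 1:index + half + 1]


def three_prime_primer(reference: str, position: int, base: str, length: int) -> str:
    """Build a forward primer whose final (3') base is the base of interest."""
    index = position - 1
    start = index - (length - 1)
    if start < 0 or index >= len(reference):
        raise ValueError("not enough upstream sequence for this primer length")
    return reference[start:index] + base


REFERENCE = "AACCGGTTAACCGGTT"  # hypothetical reference, illustration only

probe = central_probe(REFERENCE, 9, "G", 7)
primer = three_prime_primer(REFERENCE, 9, "G", 6)
print(probe, primer)
```

Passing the wildtype base instead of the variant base yields the corresponding wildtype probe or primer for the same position.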

The minimal size of a nucleic acid molecule of the present invention is a size capable of forming a stable hybrid (i.e., hybridize under stringent hybridization conditions) with the complementary sequence of a nucleic acid molecule encoding a coumarin 7-hydroxylase natural protein. As such, the size of the nucleic acid molecule is dependent on nucleic acid composition and percent homology between the nucleic acid molecule and complementary sequence. It should also be noted that the extent of homology required to form a stable hybrid can vary depending on whether the homologous sequences are interspersed throughout the nucleic acid molecules or are clustered (i.e., localized) in distinct regions on the nucleic acid molecules. The minimal size of such nucleic acid molecules is typically at least about 15 to about 18 bases in length. Unless specified otherwise, there is no limit, other than a practical limit, on the maximal size of such a nucleic acid molecule in that the nucleic acid molecule can include a portion of a gene, an entire gene, multiple genes, or portions thereof. In preferred embodiments, however, nucleic acid molecules of the present invention are typically less than about 5 kilobases in length and more preferably less than about 70 nucleotides in length. For instance, the present invention includes human CYP2A6 alleles that comprise base pair changes as described herein, having appurtenant sequences, based on either SEQ ID NO:21 or GenBank Accession No. U22027, of 10, 15, 20, 25, 30, 35, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 105, 110, 115, 120, 125, 130, 135, 140, 145, 150, 155, 160, 165, 170, 175, 180, 185, 190, 195, 200, 250, 300, 350, 400, 450, 500, or 1000 bases, or any whole number encompassed by the range of 10-10,000.

As used herein, hybridization conditions refer to standard hybridization conditions under which nucleic acid molecules are used to identify similar nucleic acid molecules. Such standard conditions are disclosed, for example, in Sambrook et al., Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Labs Press, 1989. Sambrook et al., ibid., is incorporated by reference herein in its entirety (see specifically, pages 9.31-9.62). In addition, formulae to calculate the appropriate hybridization and wash conditions to achieve hybridization permitting varying degrees of mismatch (e.g., 80%, 85%, 90%, 95%, or 98%) of nucleotides are disclosed, for example, in Meinkoth et al., 1984, Anal. Biochem. 138, 267-284; Meinkoth et al., ibid., is incorporated by reference herein in its entirety.
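As an illustration of the kind of calculation referred to above, the following Python sketch implements an approximation of hybrid melting temperature that is commonly attributed to Meinkoth et al. for longer DNA-DNA hybrids. The function and parameter names are hypothetical, and the formula and the rule of thumb of roughly 1 °C per 1% mismatch are stated here as general-practice assumptions rather than as conditions specified by this disclosure.

```python
import math


def melting_temperature(length, gc_percent, monovalent_molar, formamide_percent=0.0):
    """Approximate Tm (deg C) of a DNA-DNA hybrid.

    Assumed formula (commonly attributed to Meinkoth et al.):
    Tm = 81.5 + 16.6*log10(M) + 0.41*(%GC) - 0.61*(%formamide) - 500/L
    """
    return (
        81.5
        + 16.6 * math.log10(monovalent_molar)
        + 0.41 * gc_percent
        - 0.61 * formamide_percent
        - 500.0 / length
    )


def mismatch_adjusted_tm(tm, mismatch_percent):
    """Rule-of-thumb assumption: Tm falls about 1 deg C per 1% mismatch."""
    return tm - mismatch_percent


# Hypothetical example: a 500 bp hybrid, 50% GC, 1 M monovalent cation.
tm = melting_temperature(length=500, gc_percent=50.0, monovalent_molar=1.0)
# Conditions tolerating about 10% mismatch (i.e., about 90% identity).
tm_90 = mismatch_adjusted_tm(tm, mismatch_percent=10.0)
```

The approximation is generally considered unsuitable for short oligonucleotides, so it is offered only to show how mismatch tolerance feeds into the choice of hybridization and wash temperatures.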

The genotype of an individual is determined with respect to the provided CYP2A6 gene polymorphisms. The genotype is useful for determining the presence of phenotypically evident polymorphism, and for determining the linkage of a polymorphism to a phenotypic change.

One embodiment of the present invention is a method of identifying a sample containing a nucleic acid molecule that comprises a wildtype or variant allele, the method comprising identifying the presence or absence of one or more polymorphisms in a sequence of a gene that is capable of encoding coumarin 7-hydroxylase.

Another embodiment of the present invention is a method for identifying whether a sample containing a nucleic acid molecule is associated with inflammation and/or asthma, the method comprising identifying the presence or absence of one or more CYP2A6 alleles, wherein the pattern of alleles is indicative of inflammation and/or asthma.

Another embodiment of the present invention is a method of identifying a sample containing a nucleic acid molecule that is associated with inflammation and/or asthma, the method comprising identifying the presence or absence of a polymorphism in the nucleic acid sequence encoding coumarin 7-hydroxylase protein, wherein said polymorphism is indicative of protection against developing inflammation and/or asthma.

Another embodiment of the present invention is a method of identifying a sample containing a nucleic acid molecule that is associated with the occurrence of smoking, the method comprising identifying the presence or absence of a polymorphism in the nucleic acid sequence encoding coumarin 7-hydroxylase protein, wherein said polymorphism is prognostic for habitual smoking.

The invention provides a variety of assays for identifying individuals having one or more wild type or variant alleles. The assays identify polymorphisms in CYP2A6 cDNA or CYP2A6 genomic DNA (i.e., including the entire CYP2A6 gene and not just the coding region), which is the principal human determinant of coumarin 7-hydroxylase activity. Such assays are referred to herein as “cDNA assays” and “genomic DNA assays.” It should be noted that genomic DNA assays include not only analysis of actual genomic DNA derived from a natural source, but also analysis of any amplification product or other derivative (e.g., restriction fragments) of genomic DNA derived from a natural source. The cDNA assays are particularly useful for de novo localization of a CYP2A6 polymorphism to a particular nucleotide or nucleotides. The genomic assays are particularly useful for rapid screening of individuals for the presence of a polymorphism.

Many of the diagnostic assays rely on amplification of part or all of a CYP2A6 nucleic acid molecule. In one embodiment, portions of a CYP2A6 nucleic acid molecule are amplified by the polymerase chain reaction (PCR). The PCR process is described in, e.g., U.S. Pat. Nos. 4,683,195; 4,683,202; and 4,965,188; PCR Technology: Principles and Applications for DNA Amplification (ed. Erlich, Freeman Press, New York, N.Y., 1992); PCR Protocols: A Guide to Methods and Applications (eds. Innis et al., Academic Press, San Diego, Calif., 1990); Mattila et al., Nucleic Acids Res. 19:4967 (1991); Eckert & Kunkel, PCR Methods and Applications 1:17 (1991); and PCR (eds. McPherson et al., IRL Press, Oxford), each of which is incorporated by this reference in its entirety.

To amplify a portion of a CYP2A6 nucleic acid molecule in a sample by PCR, the sequence must be accessible to the components of the amplification system. Accessibility can be achieved by isolating nucleic acid molecules from the sample. A variety of techniques for extracting nucleic acid molecules from biological samples are known in the art. Alternatively, if the sample is fairly readily disruptable, the nucleic acid need not be purified prior to amplification by the PCR technique, i.e., if the sample comprises cells, particularly peripheral blood lymphocytes or monocytes, lysis and dispersion of the intracellular components may be accomplished merely by suspending the cells in hypotonic buffer. See Han et al., Biochemistry, 1987, vol. 26, pages 1617-1625. Polymorphisms are detected in a nucleic acid molecule from an individual being analyzed. For assay of genomic DNA, virtually any biological sample (other than pure red blood cells) is suitable. Examples of convenient tissue samples include whole blood, semen, saliva, tears, urine, fecal material, sweat, buccal, skin and hair. Nucleic acid molecules can be obtained according to procedures well-known in the art.

For amplification of mRNA sequences, a first step is the synthesis of a DNA copy (cDNA) of the region to be amplified by reverse transcription. Reverse transcription is the polymerization of deoxynucleoside triphosphates to form primer extension products that are complementary to a ribonucleic acid template. The process is effected by reverse transcriptase, an enzyme that initiates synthesis at the 3′-end of the primer and proceeds toward the 5′-end of the template until synthesis terminates. Examples of suitable polymerizing agents that convert the RNA nucleic acid molecule into a complementary, copy-DNA (cDNA) sequence are avian myeloblastosis virus reverse transcriptase and Thermus thermophilus DNA polymerase. Reverse transcription can be carried out as a separate step, or in a homogeneous reverse transcription-polymerase chain reaction (RT-PCR). Polymerizing agents suitable for synthesizing a cDNA sequence from the RNA template are reverse transcriptase (RT), such as avian myeloblastosis virus RT, Moloney murine leukemia virus RT, or Thermus thermophilus DNA polymerase.

Primers for PCR amplification are designed so that the position at which each primer hybridizes along a duplex sequence is such that an extension product synthesized from one primer, when separated from the template (complement), serves as a template for the extension of the other primer. The primers are selected to be substantially complementary to the different strands of each specific sequence to be amplified. This means that the primers must be sufficiently complementary to hybridize with their respective strands. Therefore, the primer sequence need not reflect the exact sequence of the template. For example, a non-complementary nucleotide fragment may be attached to the 5′ end of the primer with the remainder of the primer sequence being complementary to the strand. Alternatively, non-complementary bases or longer sequences can be interspersed into the primer, provided that the primer sequence has sufficient complementarity with the sequence of the strand to be amplified to hybridize therewith and thereby form a template for synthesis of the extension product of the other primer. Paired primers for amplification of a given segment of DNA are designated forward and reverse primers. The forward primer hybridizes to a double-stranded DNA molecule at a position 5′, or upstream, from the reverse primer. The forward primer hybridizes to the complement of the coding strand of the double stranded sequence, i.e., the antisense strand, and the reverse primer hybridizes to the coding strand.
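The orientation of forward and reverse primers described above can be sketched in Python. This is a minimal illustration that assumes perfectly matched primers and uses a made-up 40-base template; the template sequence and all function names are hypothetical and are not taken from SEQ ID NO:21.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def amplicon(sense, forward, reverse):
    """Return the sense-strand segment bounded by a primer pair, or None.

    The forward primer has the same sequence as the sense strand (it anneals
    to the antisense strand). The reverse primer anneals to the sense strand,
    so its binding site on the sense strand reads as its reverse complement.
    """
    sense = sense.upper()
    start = sense.find(forward.upper())
    if start < 0:
        return None
    reverse_site = reverse_complement(reverse)
    site = sense.find(reverse_site, start + len(forward))
    if site < 0:
        return None
    return sense[start:site + len(reverse_site)]


# Hypothetical toy template (sense strand, 5'->3').
TEMPLATE = "GATTACAGGCTTCCATGACCGTTAGCAAGGTCCATTGACG"
FORWARD = TEMPLATE[:10]                        # upstream, sense-strand sequence
REVERSE = reverse_complement(TEMPLATE[-10:])   # downstream, antisense sequence

product = amplicon(TEMPLATE, FORWARD, REVERSE)
```

Swapping the two primers returns `None` in this sketch, which reflects the requirement that the forward primer lie upstream of the reverse primer on the sense strand.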

The appropriate length of a primer depends on the intended use of the primer but typically ranges from about 10 to about 100, preferably about 15 to about 50, more preferably about 15 to about 35, or more preferably about 20 to about 30 nucleotides in length. The spacing of primers determines the length of segment to be amplified. The spacing is not usually critical and amplified segments can range in size from about 25 bases to at least about 35 kilobases in length. Segments from about 25 to about 2000, preferably about 50 to about 1000, more preferably about 100 to about 500 nucleotides in length are typical.

A primer can be labeled, if desired, by incorporating a label detectable by spectroscopic, photochemical, biochemical, immunochemical, or chemical means. For example, useful labels include ³²P, fluorescent dyes, electron-dense reagents, enzymes (as commonly used in an ELISA), biotin, or haptens and proteins for which antisera or monoclonal antibodies are available. A label can also be used to “capture” the primer, so as to facilitate the immobilization of either the primer or a primer extension product, such as amplified DNA, on a solid support.

Other suitable amplification methods include the ligase chain reaction (LCR) (see Wu and Wallace, Genomics, 1989, vol. 4, pages 560-569; Landegren et al., Science, 1988, vol. 241, pages 1077-1080), transcription amplification (Kwoh et al., Proc. Natl. Acad. Sci. USA, 1989, vol. 86, pages 1173-1177), self-sustained sequence replication (Guatelli et al., Proc. Natl. Acad. Sci. USA, 1990, vol. 87, pages 1874-1878), and nucleic acid based sequence amplification (NASBA). The latter two amplification methods involve isothermal reactions based on isothermal transcription, which produce both single stranded RNA (ssRNA) and double stranded DNA (dsDNA) as the amplification products in a ratio of about 30 or 100 to 1, respectively.

An allele-specific primer can be used in a PCR amplification. The allele-specific primer hybridizes to a site on a nucleic acid molecule that overlaps with a polymorphism and extension will only occur if an allelic form complementary to the primer is present. See Gibbs, Nucleic Acids Res., 1989, vol. 17, pages 2427-2448. This primer is used in conjunction with a second primer which hybridizes at a distal site. Amplification proceeds from the two primers leading to a detectable product signifying the particular allelic form is present. Thus, the presence or absence of an amplification product is detected using standard methods. Controls can be used that test the efficacy of the amplification reaction itself or that allow the experimental results to be compared with known wildtype or polymorphic CYP2A6 nucleic acid molecule samples. The method works best when the mismatch is included in the 3′-most position of the oligonucleotide aligned with the polymorphism because this position is most destabilizing to elongation from the primer.

Sample nucleic acid molecules, isolated directly from cells, amplified or cloned fragments, can also be analyzed by a number of other methods known in the art. The nucleic acid molecule can be sequenced by using either the dideoxy chain termination method or other methods (see for example Sambrook et al., Molecular Cloning, A Laboratory Manual (2nd Ed., CSHP, New York 1989); Zyskind et al., Recombinant DNA Laboratory Manual, (Acad. Press, 1988)).

Hybridization using allele-specific probes, described by, e.g., Saiki et al., Nature 324, 163-166 (1986); Dattagupta, EP 235,726; and Saiki, WO 89/11548, can be used to determine the presence or absence of a polymorphism by, for example, Southern blot, dot blots, etc. An allele-specific probe can be designed that hybridizes to a segment of a nucleic acid molecule from one individual but does not hybridize to the corresponding segment from another individual due to the presence of different polymorphic forms in the two individuals. Hybridization conditions should be sufficiently stringent that there is a significant difference in hybridization intensity between alleles.

The hybridization pattern of a control and variant sequence to an array of oligonucleotide probes immobilized on a solid support, as described in U.S. Pat. No. 5,445,934, or in WO 95/35505, can also be used as a means of detecting the presence of variant sequences.

Amplification products generated using the polymerase chain reaction can be analyzed by the use of denaturing gradient gel electrophoresis (DGGE). Different alleles can be identified based on the different sequence-dependent melting properties and electrophoretic migration of DNA in solution. See Erlich, ed., PCR Technology, Principles and Applications for DNA Amplification, (W. H. Freeman and Co, New York, 1992), Chapter 7.

Alleles of target sequences can be differentiated using single-strand conformation polymorphism analysis (SSCP), which identifies base differences by alteration in electrophoretic migration of single stranded PCR products, as described in Orita et al., Proc. Nat. Acad. Sci. 86, 2766-2770 (1989). Amplified PCR products can be generated as described above, and heated or otherwise denatured, to form single stranded amplification products. Single-stranded nucleic acids may refold or form secondary structures which are partially dependent on the base sequence. The different electrophoretic mobilities of single-stranded amplification products can be related to base-sequence difference between alleles of target sequences.

Other methods of detection include mismatch cleavage detection and heteroduplex analysis in gel matrices. These methods are used to detect conformational changes created by DNA sequence variation as alterations in electrophoretic mobility. Alternatively, where a polymorphism creates or destroys a recognition site for a restriction endonuclease, referred to as restriction fragment length polymorphism, or RFLP, the sample is digested with that endonuclease and the products size fractionated to determine whether the fragment was digested. Fractionation is performed by gel or capillary electrophoresis, particularly acrylamide or agarose gels.
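The restriction-site approach can be illustrated with a short Python sketch. It uses the Table 3 fragments for position 394 of SEQ ID NO:21 (SEQ ID NO:13, wildtype, and SEQ ID NO:14, variant), in which the G to A change happens to create the hexamer GAATTC, the well-known EcoRI recognition sequence. The specification does not state that EcoRI was used, so the enzyme choice, the function name, and the single-strand model of cutting are assumptions made for illustration only.

```python
def digest(seq, site, cut_offset):
    """Cut `seq` at every occurrence of `site`.

    `cut_offset` is the number of bases into the site after which the
    top strand is cut (1 for EcoRI, which cuts G^AATTC). Only the top
    strand is modeled in this simplified sketch.
    """
    fragments = []
    start = 0
    pos = seq.find(site)
    while pos != -1:
        fragments.append(seq[start:pos + cut_offset])
        start = pos + cut_offset
        pos = seq.find(site, pos + 1)
    fragments.append(seq[start:])
    return fragments


WILDTYPE_394 = "GACCTTTGGATTCCTCT"  # SEQ ID NO:13 (Table 3)
VARIANT_394 = "GACCTTTGAATTCCTCT"   # SEQ ID NO:14 (Table 3)

ECORI_SITE = "GAATTC"

wildtype_fragments = digest(WILDTYPE_394, ECORI_SITE, 1)  # uncut: one fragment
variant_fragments = digest(VARIANT_394, ECORI_SITE, 1)    # cut: two fragments
```

In a real assay the digested material would be a longer amplification product, and the fragment sizes resolved on a gel would depend on the primers chosen.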

In one embodiment of the present invention, an array of oligonucleotides is provided, where discrete positions on the array are complementary to one or more of the provided polymorphic sequences, e.g., oligonucleotides of at least 12 nucleotides, frequently 20 nucleotides or larger, and including the sequence flanking the polymorphic position. Such an array may comprise a series of oligonucleotides, each of which can specifically hybridize to a different polymorphism. For examples of arrays, see Hacia et al., 1996, Nat. Genet., vol. 14, pages 441-447 and DeRisi et al., 1996, Nat. Genet., vol. 14, pages 457-460. Arrays of interest may further comprise sequences, including polymorphisms, of other genetic sequences, particularly other sequences of interest for pharmacogenetic screening.

It is within the scope of the present invention that one or more CYP2A6 polymorphisms provided herein can be detected in a single assay such as a multiplex assay to identify the presence or absence of different alleles in the same assay; see, for example, Stuven et al., Pharmacogenetics, 1996, vol. 6, pages 417-421.

According to the present invention, a polymorphism provided herein is indicative of protection against developing inflammation and/or asthma. The presence of a polymorphism of the present invention in the CYP2A6 gene is capable of protecting an individual against developing inflammation and/or asthma. Thus, the absence of a polymorphism of the present invention in the CYP2A6 gene is predictive of inflammation and/or asthma development. A polymorphism of the present invention is also prognostic for habitual smoking. The absence of a polymorphism of the present invention in the CYP2A6 gene is predictive of habitual smoking.

An example of a polymorphism of the present invention, the presence or absence of which is predictive of inflammation and/or asthma or habitual smoking in individuals, is designated the 743 polymorphism. See Examples 4 and 5. The 743 polymorphism results from a single-base mutation in genomic CYP2A6 DNA at nucleotide position -39 of the promoter region. The nucleotide corresponds to nucleotide 743 in SEQ ID NO:21. The 743 polymorphism results in a T to G transversion. The 743 polymorphism at this position occurs in all ethnic groups studied at varying frequencies. See Example 2. Coumarin 7-hydroxylase is found in human lung. Without being bound by theory, Applicants believe that the coumarin 7-hydroxylase in lung is capable of metabolizing environmental substances that enter the lung. Upon metabolism, the environmental substances can irritate airway tissue resulting in inflammation of such tissue and/or asthma. The polymorphism at position 743 of SEQ ID NO:21 disrupts the expression of the CYP2A6 gene, thereby decreasing the production of coumarin 7-hydroxylase. Thus, Applicants believe that individuals polymorphic at position 743 produce lower levels of coumarin 7-hydroxylase, thereby lowering the production of environmental irritants in their lungs that can result in inflammation or asthma.

A preferred strategy for analysis entails amplification of a DNA sequence spanning the 743 polymorphism. Amplification of such a sequence can be primed from forward and reverse primers that hybridize to a CYP2A6 gene on opposite sides of the 743 polymorphism but which do not hybridize to the variant nucleotide itself. That is, for detection of the 743 polymorphism, the forward primer hybridizes upstream or 5′ to the 743 nucleotide and the reverse primer hybridizes downstream or 3′ to this nucleotide. The forward primer is sufficiently complementary to the antisense strand of a CYP2A6 nucleic acid molecule to hybridize therewith and the reverse primer is sufficiently complementary to the sense strand of the CYP2A6 sequence to hybridize therewith. The primers usually comprise first and second subsequences from opposite strands of a double-stranded CYP2A6 DNA sequence. It is particularly important to avoid mismatches in the two nucleotides at the 3′ end of the primer (especially the terminal nucleotide).

For amplification of the 743 polymorphism, forward primers preferably comprise a segment of contiguous nucleotides from the promoter region and reverse primers a segment of contiguous nucleotides from the intron 1 region.

Preferred primers exhibit perfect sequence identity to CYP2A6 and lesser sequence identity to corresponding regions of related genes, such as CYP2A7 and CYP2A13. Such primers are designed by comparison of the wildtype CYP2A6 sequence with corresponding sequences from CYP2A7 and CYP2A13. An exemplary pair of primers for amplifying a segment spanning the 743 mutation is described in Example 1, such as SEQ ID NO:1 and SEQ ID NO:2. The amplification product from these primers has a length of 988 bp.

Having amplified a segment of a CYP2A6 gene known to span the 743 polymorphism, a variety of assays disclosed herein are available for determining whether the 743 polymorphism is present, preferably using allele-specific primers. For example, selective amplification of the wildtype allele of the CYP2A6 gene can be accomplished using a forward primer that has about 10-50, and usually 15-30, nucleotides from the wildtype CYP2A6 genomic sequence, including nucleotide 743. Such a forward primer, when paired with any suitable reverse primer downstream from nucleotide 743 (i.e., sufficiently complementary to the sense strand of CYP2A6 to hybridize therewith), can be used to amplify selectively the wildtype allele without amplifying a mutant allele. The 743 nucleotide usually occurs near, or preferably at, the 3′ end of the primer. The same result can be achieved by using a reverse primer that has about 10-50 or usually 15-30 contiguous nucleotides from the complement of the wildtype CYP2A6 genomic sequence (i.e., the antisense strand) including the nucleotide at position 743. Such a reverse primer can be paired with any suitable forward primer sufficiently complementary to a sequence of the antisense strand of the CYP2A6 gene upstream from nucleotide 743 to hybridize therewith. The 743 nucleotide should again be at or near the 3′ end of the reverse primer. For selective amplification of a 743 mutant allele, a suitable forward primer for amplification comprises about 10-50 or usually 15-30 contiguous nucleotides including nucleotide 743 from the mutant CYP2A6 genomic sequence (i.e., the sense strand). The forward primer can be paired with any suitable reverse primer sufficiently complementary to the sense strand of a CYP2A6 genomic subsequence downstream from nucleotide 743 to hybridize therewith. Alternatively, the same result can be achieved using a reverse primer comprising about 10-50 or 15-30 contiguous nucleotides including nucleotide 743 from the complement of the mutant CYP2A6 sequence (i.e., the antisense strand). Such a reverse primer can be paired with any suitable forward primer sufficiently complementary to the antisense strand of a CYP2A6 subsequence upstream from nucleotide 743 to hybridize therewith.
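The construction of allele-specific forward primers with the polymorphic nucleotide at the 3′ end can be sketched in Python using the 18-nucleotide Table 3 fragments spanning position 743 (SEQ ID NO:17, wildtype, and SEQ ID NO:18, variant). Because those fragments provide only nine bases upstream of the polymorphic position, the example primers are 10 nucleotides long, which is shorter than the usual 15-30 nucleotides and is used here purely for illustration. The function names are hypothetical, and the extension rule is a deliberate simplification that treats any mismatch as blocking extension.

```python
WILDTYPE_743 = "CAGGCAGTATAAAGGCAA"  # SEQ ID NO:17 (Table 3)
VARIANT_743 = "CAGGCAGTAGAAAGGCAA"   # SEQ ID NO:18 (Table 3)


def polymorphic_index(wildtype, variant):
    """Return the 0-based index of the single differing base."""
    if len(wildtype) != len(variant):
        raise ValueError("sequences must have equal length")
    diffs = [i for i, (a, b) in enumerate(zip(wildtype, variant)) if a != b]
    if len(diffs) != 1:
        raise ValueError("expected exactly one differing position")
    return diffs[0]


def allele_specific_forward(seq, index, length):
    """Forward primer of `length` bases ending (3' end) at `index`."""
    start = index + 1 - length
    if start < 0:
        raise ValueError("not enough upstream sequence for this length")
    return seq[start:index + 1]


def would_extend(primer, sense_template):
    """Simplified model: extension requires a perfect match, 3' base included."""
    return primer in sense_template


idx = polymorphic_index(WILDTYPE_743, VARIANT_743)
wildtype_primer = allele_specific_forward(WILDTYPE_743, idx, 10)
variant_primer = allele_specific_forward(VARIANT_743, idx, 10)
```

Under this simplified model, each primer extends only on its own allele, which is the behavior the two-aliquot assay relies on.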

Following amplification, the sample under test is characterized as wildtype or mutant by the presence or absence of an amplification product. With a primer designed for selective amplification of the wildtype allele, the presence of an amplification product is indicative of that allele and the absence of an amplification product indicative of a mutant allele. The converse applies for primers designed for selective amplification of a mutant allele. In a preferred assay, a sample is divided into two aliquots, one of which is amplified using primers for wildtype allele amplification, the other of which is amplified using primers appropriate for mutant allele amplification. The presence of an amplification product in one but not both of the aliquots indicates that the individual under test is either wildtype or homozygous for the mutation (depending on the aliquot in which the amplification product occurred). The presence of amplification product in both aliquots indicates that the individual is heterozygous. The absence of an amplification product in both aliquots would indicate either the absence of a CYP2A6 gene or a quality control problem in the amplification procedure requiring that the assay be repeated. The presence or absence of amplification products can be detected by gel electrophoresis using methods standard in the art or described herein.
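The interpretation of the two-aliquot assay amounts to a small decision table, sketched here in Python. The function name and the returned labels are hypothetical; the logic follows the four outcomes described in the preceding paragraph.

```python
def call_genotype(wildtype_product, mutant_product):
    """Interpret presence (True) or absence (False) of product in each aliquot.

    wildtype_product: product seen with wildtype-specific primers
    mutant_product:   product seen with mutant-specific primers
    """
    if wildtype_product and mutant_product:
        return "heterozygous"
    if wildtype_product:
        return "homozygous wildtype"
    if mutant_product:
        return "homozygous mutant"
    # No product in either aliquot: no CYP2A6 gene or a failed reaction.
    return "no call: repeat assay"


# Example: bands observed in both aliquots.
result = call_genotype(wildtype_product=True, mutant_product=True)
```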

One embodiment of the present invention is a diagnostic kit. The kit comprises useful components for practicing the methods of the present invention. The kit typically comprises at least one of the primers needed for the PCR amplification if PCR amplification is used and also control DNA suitable for determining the success of the PCR reaction and/or to confirm the identification of the presence or absence of a polymorphism in a sample. A kit usually contains a matched pair of forward and reverse primers as described above for amplifying a segment encompassing a polymorphism of the present invention. For selective amplification of mutant or wildtype alleles, kits usually contain a pair of primers for amplification of the mutant allele and/or a separate pair of primers for amplification of the wildtype allele. Optional additional components of the kit include, for example, restriction enzymes for analysis of amplification products, reverse-transcriptase or polymerase, the substrate nucleoside triphosphates, and the appropriate buffers for reverse transcription, PCR, or restriction enzyme reactions. Usually, the kit also contains instructions for carrying out the methods.

The method of the present invention is characterized by detecting the polymorphisms provided herein, and is useful in gene diagnosis for detecting CYP2A6 gene polymorphisms.

As long as the method is capable of detecting the aforementioned specific types of mutation, which are clearly defined and characterized by the present invention, no limitation is imposed on the technique to be employed in the method. For example, a variety of routine methods may be used. Since the types of gene mutation to be detected by the present invention are now clarified and specified, it would be obvious for persons skilled in the art to adopt a suitable method for detecting them from a reading of the disclosure of this specification.

Also provided by the present invention are methods for detecting a polymorphic sequence of the P450 gene in a sample containing human nucleic acid molecules comprising identifying the presence or absence of a polymorphism that correlates with the nucleic acid sequence identified at positions 202, 369, 394, 413, 743 or 841 of SEQ ID NO:21. In one embodiment, said method further comprises: (a) mixing said nucleic acid molecules with one or more second nucleic acid molecules of the present invention so as to form a mixture; (b) subjecting said mixture to hybridization conditions; and (c) detecting any hybrids formed.

Those methods wherein said nucleic acid is amplified prior to step (a) are preferred. The materials useful for these methods can be obtained as described, and these methods can be accomplished as discussed. In a preferred embodiment, the second nucleic acid molecule consists of a primer of the present invention and step (c) is accomplished by determining the presence or absence of a PCR product. In another preferred embodiment, the second nucleic acid molecule is a probe of the present invention, wherein the probe is labeled with a detectable marker and step (c) is accomplished by determining the presence or absence of the detectable marker.

In other embodiments, methods of the present invention comprise digesting DNA comprising at least a part of the nucleic acid sequence containing the polymorphic site with a restriction enzyme that will cut, or will not cut, at or adjacent to one of the polymorphic positions according to whether the polymorphism is present. In this manner, such restriction enzymes distinguish between wildtype and mutant alleles. Those methods wherein said nucleic acid is amplified prior to the digestion step are preferred. The materials useful for these methods can be obtained as described, and these methods can be accomplished as discussed.

Polyclonal and/or monoclonal antibodies that specifically bind to variant gene products but not to corresponding prototypical gene products are also provided. Antibodies can be made by injecting mice or other animals with the variant gene product or synthetic peptide fragments thereof. Monoclonal antibodies are screened as described, for example, in Harlow & Lane, Antibodies, A Laboratory Manual, Cold Spring Harbor Press, New York (1988); Goding, Monoclonal Antibodies, Principles and Practice (2d ed.) Academic Press, New York (1986). Monoclonal antibodies are tested for specific immunoreactivity with a variant gene product and lack of immunoreactivity to the corresponding prototypical gene product. These antibodies are useful in diagnostic assays for detection of the variant form, or as an active ingredient in a pharmaceutical composition.

Another embodiment of the present invention includes a computer for displaying a nucleic acid sequence of a molecule of the present invention, as broadly described herein. Such a computer includes a computer-readable medium encoded with one or more of said nucleic acid sequences to create an electronic file. The computer further includes hardware and software that display the nucleic acid sequence in the electronic file as a linear model of the molecule for analysis, alignment with other sequences or visualization of the nucleic acid sequence by the computer. Such hardware and software components are well-known in the art. Also provided are databases comprising sequence information pertaining to nucleic acid molecules of the present invention.

EXAMPLES

Example 1

This example describes the identification of variants of the known cytochrome P450 2A6 sequence (CYP4502A6).

Blood specimens from 32 individuals were collected after obtaining informed consent. All samples were stripped of personal identifiers to maintain confidentiality. The only data associated with the sample were self-reported gender and racial group designations. Of the 32 individuals, 10 were African Americans, 10 were Caucasians, 6 were Japanese and 6 were Chinese. Genomic DNA was isolated using standard methods. Polymerase chain reaction amplification of regions of the CYP4502A6 gene was performed using the primers listed in Table 1. Each polymerase chain reaction (PCR) amplification was performed in a total reaction volume of 100 microliters (μl). The final magnesium chloride concentration for each reaction was optimized empirically and is shown in Table 1. The final genomic DNA concentration was about 100 nanograms (ng) per reaction from 2 individuals. The PCR reactions were performed using Perkin Elmer's GeneAmp PCR kit (available from Perkin Elmer, Norwalk, Conn.) using Taq Gold DNA polymerase according to the manufacturer's instructions and using the following primers.

TABLE 1 PCR Primers and Mg++ Concentration

| Region | Forward/Reverse | SEQ ID NO: | 5′-3′ | [Mg++] |
|---|---|---|---|---|
| 2A6 | F | 1 | TTCCCCTGAAATATGG | 2 mM |
| 2A6 | R | 2 | CTTCTCCCTGTCTTGG | 2 mM |

Thermal cycling was performed with an initial denaturation step at 95° C. for 10 min, followed by 35 cycles of denaturation at 95° C. for 30 sec, primer annealing at 55° C. for 45 sec, and primer extension at 72° C. for 2 min, followed by final extension at 72° C. for 5 min.
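For planning purposes, the cycling program above can be written down as data and its total programmed hold time computed. The Python sketch below is only an illustration: the variable and function names are hypothetical, and the total ignores the temperature ramp times of the instrument, which are not given in this example.

```python
# Hold times in seconds, taken from the cycling conditions described above.
INITIAL_DENATURATION_S = 10 * 60          # 95 C for 10 min
CYCLES = 35
CYCLE_STEPS_S = {
    "denaturation_95C": 30,
    "annealing_55C": 45,
    "extension_72C": 2 * 60,
}
FINAL_EXTENSION_S = 5 * 60                # 72 C for 5 min


def total_hold_seconds():
    """Sum of programmed hold times, excluding instrument ramp times."""
    per_cycle = sum(CYCLE_STEPS_S.values())
    return INITIAL_DENATURATION_S + CYCLES * per_cycle + FINAL_EXTENSION_S


total_minutes = total_hold_seconds() / 60  # 128.75 minutes of hold time
```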

The resulting PCR products were purified using Microcon-100 columns (available from Millipore, Bedford, Mass.). PCR products from two individuals were combined for each cycle of sequencing. Cycle sequencing was performed on the GeneAmp PCR System 9600 PCR machine using the ABI Prism dRhodamine Terminator Cycle Sequencing Ready Reaction Kit (available from Applied Biosystems, Inc., Foster City, Calif.) according to the manufacturer's directions. Oligonucleotide primers used for the sequencing reactions include SEQ ID NO:1 and those shown in Table 2.

TABLE 2 Sequencing Primers

| Region | Forward/Reverse | SEQ ID NO: | 5′-3′ |
|---|---|---|---|
| 2A6(1) | F | 3 | TTCCCCTGAAATATGG |
| 2A6(2) | F | 4 | GCCACACTTTGTCTTACC |
| 2A6(3) | F | 5 | TGGGGCTTGTAGTTGG |
| 2A6(1) | R | 6 | CTGTTGTGGAGGATGC |
| 2A6(2) | R | 7 | GGTCTGTGGTACTTCAGGAG |
| 2A6(3) | R | 8 | CAATGAAGGGCAATGG |

About 8 μl sequencing reactions were subjected to 30 cycles at 96° C. for 20 sec, 50° C. for 20 sec, and 60° C. for 4 min, followed by ethanol precipitation. Samples were evaporated to dryness at 50° C. for about 15 min and resuspended in 2 μl of loading buffer (5:1 deionized formamide:50 mM EDTA pH 8.0), heated to 65° C. for 5 min, and electrophoresed through 4% polyacrylamide/6M urea gels in an ABI 377 Nucleic Acid Analyzer according to the manufacturer's instructions to obtain sequence information. All sequences were determined from both the 5′ and 3′ (sense and antisense) direction. The 16 electropherograms were analyzed by comparing peak heights, looking for about 25% reduction in peak size and/or presence of extra peaks as an indication of heterozygosity.

Portions of the CYP4502A6 sequence including a single nucleotide polymorphism identified from the sequencing are shown below the corresponding portions of the wildtype sequence, with the position of the polymorphism shown in bold, in Table 3. For example, a C to T transition was discovered at base pair −580 in the promoter region of the CYP4502A6 gene.

TABLE 3
Newly Identified CYP4502A6 Gene Polymorphisms

Polymorphism     Position in     SEQ
Location         SEQ ID NO: 21   ID NO   Sequence
Promoter −580    202              9      GAACCCGCTGGGCTT
                                 10      GAACCCGTTGGGCTT
Promoter −413    369             11      ACTTTGTCTTACCCTAA
                                 12      ACTTTGTCTCACCCTAA
Promoter −388    394             13      GACCTTTGGATTCCTCT
                                 14      GACCTTTGAATTCCTCT
Promoter −369    413             15      CCCTGGAACCCCCAGATC
                                 16      CCCTGGAACACCCAGATC
Promoter −39     743             17      CAGGCAGTATAAAGGCAA
                                 18      CAGGCAGTAGAAAGGCAA
Exon 1 51        841             19      CCTGACTGTGATGGTCT
                                 20      CCTGACTGTAATGGTCT

SEQ ID NO:21 lists the sequence of the CYP4502A6 gene, including the promoter and exons 1 and 2, and corrects some errors that were present in GenBank Accession No. U22027.
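Because the bold marking of the polymorphic base may not survive in plain-text renderings, the position of each substitution can be recovered by comparing each wildtype fragment with the variant fragment listed beneath it. The following is a minimal sketch using the sequences of SEQ ID NOs: 9-20 as given in Table 3; the names TABLE_3, find_snp and substitution_class are illustrative, and the offsets reported are 0-based positions within each short fragment, not gene coordinates.

```python
# Wildtype/variant fragment pairs copied from Table 3 (SEQ ID NOs: 9-20).
TABLE_3 = {
    "Promoter -580": ("GAACCCGCTGGGCTT", "GAACCCGTTGGGCTT"),
    "Promoter -413": ("ACTTTGTCTTACCCTAA", "ACTTTGTCTCACCCTAA"),
    "Promoter -388": ("GACCTTTGGATTCCTCT", "GACCTTTGAATTCCTCT"),
    "Promoter -369": ("CCCTGGAACCCCCAGATC", "CCCTGGAACACCCAGATC"),
    "Promoter -39": ("CAGGCAGTATAAAGGCAA", "CAGGCAGTAGAAAGGCAA"),
    "Exon 1 51": ("CCTGACTGTGATGGTCT", "CCTGACTGTAATGGTCT"),
}

PURINES = {"A", "G"}


def find_snp(wildtype, variant):
    """Return (0-based offset, wildtype base, variant base) for a single substitution."""
    if len(wildtype) != len(variant):
        raise ValueError("fragments must have equal length")
    diffs = [(i, w, v) for i, (w, v) in enumerate(zip(wildtype, variant)) if w != v]
    if len(diffs) != 1:
        raise ValueError(f"expected exactly one difference, found {len(diffs)}")
    return diffs[0]


def substitution_class(ref, alt):
    """Purine<->purine or pyrimidine<->pyrimidine is a transition; otherwise a transversion."""
    return "transition" if (ref in PURINES) == (alt in PURINES) else "transversion"


if __name__ == "__main__":
    for location, (wildtype, variant) in TABLE_3.items():
        offset, ref, alt = find_snp(wildtype, variant)
        print(f"{location}: {ref}>{alt} at fragment offset {offset} "
              f"({substitution_class(ref, alt)})")
```

Applied to the first pair, the comparison recovers the C to T transition at promoter position −580 noted in the text; applied to SEQ ID NOs: 17 and 18, it recovers the T to G change at promoter position −39.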

Example 2

This example describes genotype frequencies for a CYP4502A6 promoter variant in different ethnicities.

Genotyping of 32 individuals from each of 4 broadly defined racial groups (Caucasian, African American, Hispanic and Asian American) for one polymorphism produced the allele and genotype frequencies shown in Table 4.

TABLE 4
CYP4502A6 Promoter Variant and Ethnic Frequencies

                            Allele
Racial Group        A (wild type)   B (mutant)     n
Caucasian           0.94            0.06         573
African American    0.89            0.11         236
Hispanic            0.89            0.11         300
Asian American      0.77            0.23          72

The results indicate that the variant allele, a guanine residue at position −39 in the promoter of CYP4502A6, occurs in all ethnic groups studied but at different frequencies among the groups.
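Table 4 reports allele frequencies only. For orientation, genotype proportions can be approximated from an allele frequency if Hardy-Weinberg equilibrium is assumed. That assumption is not made in this disclosure and is introduced here purely for illustration, as are the names B_ALLELE_FREQ and hardy_weinberg.

```python
# B (variant) allele frequencies copied from Table 4.
B_ALLELE_FREQ = {
    "Caucasian": 0.06,
    "African American": 0.11,
    "Hispanic": 0.11,
    "Asian American": 0.23,
}


def hardy_weinberg(q):
    """Expected genotype proportions for variant-allele frequency q.

    Assumes Hardy-Weinberg equilibrium (an illustrative assumption,
    not a finding of the study).
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("allele frequency must be between 0 and 1")
    p = 1.0 - q
    return {"AA": p * p, "AB": 2 * p * q, "BB": q * q}


if __name__ == "__main__":
    for group, q in B_ALLELE_FREQ.items():
        expected = hardy_weinberg(q)
        carriers = expected["AB"] + expected["BB"]
        print(f"{group}: expected carrier proportion {carriers:.3f}")
```

Observed genotype counts, where available, should be preferred over these approximations.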

Example 3

This example describes the comparison of expression of a luciferase gene using promoter regions comprising CYP4502A6 wildtype promoter sequence and CYP4502A6 promoter sequence containing a polymorphic site.

Two different recombinant molecules, one containing the promoter region from the wildtype allele and the other from the mutant allele of CYP4502A6, operatively linked to the luciferase gene transcription control sequences, were produced as follows. A DNA fragment of about 744 nucleotides (SEQ ID NO:22) comprising the CYP4502A6 promoter region from the wildtype allele, denoted herein as 2A6WT₇₄₄, was PCR amplified from 50 ng of genomic DNA isolated from individuals known to have the wildtype allele, using a sense primer having the nucleic acid sequence 5′ AAGCTTAGAAGATGGCAGTGGAG 3′ (SEQ ID NO:23) that includes a Hind III site at the 5′ end, and an antisense primer having the nucleic acid sequence 5′ GAGCTCGGTGGTAGAGGGATG 3′ (SEQ ID NO:24) that includes a Sac I site at the 5′ end. The PCR product was used directly for subcloning into the TA vector pCR2.1 (available from Invitrogen, Carlsbad, Calif.), producing the recombinant molecule p2A6WT₇₄₄.

A DNA fragment of about 744 nucleotides (SEQ ID NO:25) comprising the CYP4502A6 promoter region from the allele containing the polymorphic site at position −39 of the promoter region, denoted herein as 2A6SNP₇₄₄, was PCR amplified from 50 ng of genomic DNA isolated from individuals known to have the mutant allele, using sense primer SEQ ID NO:23 and antisense primer SEQ ID NO:24. The PCR product was used directly for subcloning into the TA vector pCR2.1 (available from Invitrogen, Carlsbad, Calif.), producing the recombinant molecule p2A6SNP₇₄₄.

The promoter regions 2A6WT₇₄₄ and 2A6SNP₇₄₄ were then subcloned separately into an expression vector containing a luciferase gene. Recombinant molecule p2A6WT₇₄₄luc was produced by digesting 2A6WT₇₄₄ with HindIII and SacI restriction endonucleases, column purifying the resulting fragment, and directionally subcloning the fragment into expression vector p20LUC (van Zonneveld et al., 1988, PNAS 85:5525-9). Recombinant molecule p2A6SNP₇₄₄luc was produced by digesting 2A6SNP₇₄₄ with the same restriction enzymes and directionally subcloning the fragment into the p20LUC expression vector.

Recombinant molecules p2A6WT₇₄₄luc and p2A6SNP₇₄₄luc were each transfected into a human carcinoma cell line (available from ATCC), a human lymphoblast cell line (available from Coriell Cell Repository) and a Chinese hamster ovary cell line (available from ATCC) using standard techniques to form recombinant cells HEP-p2A6WT₇₄₄luc cells, HEP-p2A6SNP₇₄₄luc cells, LB-p2A6WT₇₄₄luc cells, LB-p2A6SNP₇₄₄luc cells, CHO-p2A6WT₇₄₄luc cells and CHO-p2A6SNP₇₄₄luc cells, respectively. Each transfection experiment was done in triplicate. Cells were also transfected with control vectors, including a control vector constitutively expressing either a beta-galactosidase gene or a different type of luciferase gene. To obtain luciferase expression, recombinant cells were grown for about 24-48 hours after transfection in standard media. Luciferase activity in the recombinant cells was then determined by lysing the cells, taking the supernatant after centrifugation, and adding 100 microliters of luciferase substrate to 20 microliters of supernatant, followed by an immediate measurement of light emission using a Turner Design 20/20 luminometer (available from VWR, Bridgeport, N.J.).

The experiments were repeated at least three times in each of the different cell lines, and the results of three independent experiments are shown in FIG. 1. The results indicate that use of the CYP4502A6 promoter region containing the polymorphic site produced about 3- to 5-fold less luciferase activity compared to use of the promoter from the wildtype allele.

Example 4

This example describes the association of the Promoter −39 CYP4502A6 polymorphism described in Example 1 with the occurrence of asthma.

Genomic DNA was isolated from blood lymphocytes of 223 individuals with asthma and 256 individuals without asthma using standard methods. Taqman assays were performed using DNA samples from each individual to identify the presence or absence of the Promoter −39 CYP4502A6 variant (SEQ ID NO: 18). The following primers were used:

PCR Primer    SEQ ID NO:   Primer Sequence
2A6-39 for    26           TGGGAGGTGAAATGAGGTAATTATG
2A6-39 rev    27           GTACCACCATCTCCCTACTATCTAC

PCR amplification was performed at a MgCl₂ concentration of 5 mM. Thermal cycling was performed with an initial denaturation step at 95° C. for 10 min, followed by 47 cycles of denaturation at 94° C. for 30 sec and primer annealing and extension at 62° C. for 60 sec. The resulting PCR products were resolved using standard gel electrophoresis methods and hybridized to the following probes: 5′ TCAGGCAGTATAAAGGCAAACCACCC 3′ (wildtype, SEQ ID NO:28) and 5′ TTCAGGCAGTAGAAAGGCAAACCACC 3′ (mutant, SEQ ID NO:29). The resulting fluorescence from the hybridization was measured using a fluorometer to determine the occurrence of the polymorphic sites and homo- or heterozygosity. The results are shown below in Table 5.

TABLE 5
CYP4502A6 Variants and Asthma. The chi-square p-value comparing the observed to expected in this Table is 0.01. The relative risk of asthma for those with versus those without the variant (assuming a dominant model) is 0.53 (95% CI 0.30-0.93) with a corresponding p-value = 0.03. The A allele is wildtype and the B allele is mutant.

                      Alleles (Freq/%)
Asthma    AA             AB            BB          Total
NO        214 (83.59%)   37 (14.45%)   5 (1.95%)   256
YES       202 (90.58%)   21 (9.42%)    0 (0%)      223
Total     416            58            5           479

The results indicate that individuals have about a 2-fold decreased risk of developing asthma if they have the B allele than if they are homozygous for the A allele. Thus, the data suggest that the Promoter −39 variation has protective value against developing asthma. In addition, the data indicate that being homozygous for the A allele has little predictive value.
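The dominant-model estimate quoted with Table 5 can be checked directly from the genotype counts. One reading that reproduces the reported values is an odds ratio for carriers of the B allele (AB plus BB) versus AA homozygotes, with a 95% confidence interval computed on the log scale (Woolf method). The sketch below follows that reading; the choice of method is an assumption, since the statistical procedure is not specified in the text, and the function name is illustrative.

```python
import math


def odds_ratio_ci(exposed_cases, unexposed_cases, exposed_controls,
                  unexposed_controls, z=1.96):
    """Odds ratio with a log-scale (Woolf) confidence interval.

    The method is an assumption chosen because it matches the reported
    figures; it is not stated in the source.
    """
    odds_ratio = (exposed_cases * unexposed_controls) / (unexposed_cases * exposed_controls)
    standard_error = math.sqrt(
        1 / exposed_cases + 1 / unexposed_cases
        + 1 / exposed_controls + 1 / unexposed_controls
    )
    log_or = math.log(odds_ratio)
    lower = math.exp(log_or - z * standard_error)
    upper = math.exp(log_or + z * standard_error)
    return odds_ratio, lower, upper


# Counts from Table 5 under a dominant model (carrier = AB or BB).
ASTHMA_CARRIERS = 21 + 0
ASTHMA_NONCARRIERS = 202
NO_ASTHMA_CARRIERS = 37 + 5
NO_ASTHMA_NONCARRIERS = 214

if __name__ == "__main__":
    estimate, lower, upper = odds_ratio_ci(
        ASTHMA_CARRIERS, ASTHMA_NONCARRIERS,
        NO_ASTHMA_CARRIERS, NO_ASTHMA_NONCARRIERS,
    )
    print(f"odds ratio {estimate:.2f} (95% CI {lower:.2f}-{upper:.2f})")
```

With the counts in Table 5 this gives approximately 0.53 with an interval of about 0.30 to 0.93, in agreement with the values stated in the table legend.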

Example 5

This example describes the association of the Promoter −39 CYP4502A6 polymorphism described in Example 1 with the occurrence of smoking.

Results from the PCR reactions performed in Example 4 to identify the presence or absence of the Promoter −39 CYP4502A6 variant (SEQ ID NO:18) were correlated with smoker versus non-smoker information from each individual. The results are shown below in Table 6.

TABLE 6
CYP4502A6 Variants and Smoking

                                   Alleles
Smoking Status         AA          AB           BB         Total
>1 pack/day             36 (92%)    3 (8%)      0 (0%)      39
>5 cigs-<1 pack/day    138 (88%)   18 (11.5%)   1 (0.5%)   157
<5 cigs/day             43 (78%)   11 (20%)     1 (2%)      55

The results indicate that there is a trend toward increased cigarette consumption among those individuals without the variant, at a statistical significance of p=0.1. There was no difference in variant status among those individuals who smoked compared with those who never smoked.
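The trend described above can be made explicit by computing, for each smoking category in Table 6, the fraction of individuals carrying the B allele and the B allele frequency. This is a minimal sketch using the counts in the table; the names TABLE_6, carrier_fraction and b_allele_frequency are illustrative, and no significance test is attempted because the test behind the stated p-value is not identified in the text.

```python
# Genotype counts (AA, AB, BB) copied from Table 6.
TABLE_6 = {
    ">1 pack/day": (36, 3, 0),
    ">5 cigs-<1 pack/day": (138, 18, 1),
    "<5 cigs/day": (43, 11, 1),
}


def carrier_fraction(aa, ab, bb):
    """Fraction of individuals with at least one B allele."""
    return (ab + bb) / (aa + ab + bb)


def b_allele_frequency(aa, ab, bb):
    """B allele count divided by total chromosomes (two per individual)."""
    return (ab + 2 * bb) / (2 * (aa + ab + bb))


if __name__ == "__main__":
    for status, counts in TABLE_6.items():
        print(f"{status}: carriers {carrier_fraction(*counts):.1%}, "
              f"B allele frequency {b_allele_frequency(*counts):.3f}")
```

Both quantities rise as reported consumption falls, which is the direction of the trend noted in the text.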

Example 6

This example is a comparison of the wildtype sequence for cytochrome P450 2A6 (CYP4502A6) reported in the literature and having GenBank Accession No. U22027 with the wildtype sequence for cytochrome P450 2A6 (CYP4502A6) of the present invention and being identified as SEQ ID NO:21.

A comparison of GenBank Accession No. U22027 and SEQ ID NO:21 is shown below in Table 7.

TABLE 7

Position as referenced                  Nucleotide of    GenBank Accession
by SEQ ID NO: 21        Difference      SEQ ID NO: 21    No. U22027
75                      substitution    C                G
141                     insertion       A                —
144                     substitution    A                T
between 198 and 199     deletion        —                G
423                     substitution    A                G
78                      substitution    A                T
828                     substitution    T                C

Those skilled in the art will appreciate that numerous changes and modifications may be made to the preferred embodiments of the invention and that such changes and modifications may be made without departing from the spirit of the invention. It is therefore intended that the appended claims cover all such equivalent variations as fall within the true spirit and scope of the invention.

1-35. (canceled)
 36. A method of detecting the presence of a G polymorphism at promoter position −39 of a CYP2A6 gene in an individual, the method comprising: (a) obtaining a nucleic acid sample from the individual; and (b) amplifying a region of the CYP2A6 gene using a forward primer that hybridizes to the gene upstream of promoter position −39 and a reverse primer that hybridizes to intron 1 of the gene to generate an amplification product; and (c) assaying the amplification product to determine whether the individual has a G polymorphism at promoter position −39 of the CYP2A6 gene.
 37. The method of claim 36, wherein assaying comprises determining whether the individual is homozygous or heterozygous for the G polymorphism.
 38. The method of claim 36, wherein the reverse primer has a nucleotide sequence comprising contiguous nucleotides from intron 1.
 39. The method of claim 36, wherein the forward primer comprises SEQ ID NO:1 and the reverse primer comprises SEQ ID NO:2.
 40. A method of detecting the presence or absence of a G polymorphism at promoter position −39 of a CYP2A6 gene in an individual, the method comprising: (a) obtaining a nucleic acid sample from the individual; and (b) mixing the nucleic acid sample with an allele-specific primer, specific for a G polymorphism or a T at promoter position −39 of the CYP2A6 gene, such that upon hybridization a polymerase-mediated extension product forms if an allelic form complementary to the allele-specific primer is present; and (c) analyzing the extension product to identify the presence or absence of the G polymorphism at promoter position −39.
 41. The method of claim 40, wherein the method further comprises contacting the nucleic acid sample with a second primer which hybridizes to a distal site such that an allele-specific amplification product is formed.
 42. The method of claim 40, wherein analyzing comprises determining whether the individual is homozygous or heterozygous for the G polymorphism.
 43. A method of detecting the presence of a G polymorphism at promoter position −39 of a CYP2A6 gene in an individual, the method comprising: (a) obtaining a nucleic acid sample from the individual; and (b) generating an amplification product from the nucleic acid sample specific for a region of the CYP2A6 gene spanning promoter position −39, wherein the amplified region lacks exonic sequences of the CYP2A6 gene; and (c) analyzing the amplification products to identify the presence or absence of the G polymorphism.
 44. The method of claim 43, wherein analyzing comprises determining whether the individual is homozygous or heterozygous for the G polymorphism.
 45. The method of claim 43, wherein an allele-specific primer for promoter position −39 of the CYP2A6 gene is used to amplify the region.